Arbitrary Precision Arithmetic - SIMD Style
Authors
Abstract
Current general-purpose processors have been enhanced with so-called "media instruction sets" to achieve performance gains in media-processing-intensive applications. The added instructions exploit two facts: media applications use small native datatypes, with widths much smaller than those supported by commercial processors, and such applications offer abundant data parallelism. Current processors enhanced with a media instruction set support arithmetic on sub-datatypes of only 8-bit, 16-bit, 32-bit and 64-bit precision. In this paper we motivate the need for arbitrary-precision packed arithmetic, wherein the width of the sub-datatypes is programmable by the user, and propose an implementation for arithmetic on such packed datatypes. The proposed scheme has marginal hardware overhead over conventional implementations of arithmetic on processors incorporating a multimedia extended instruction set.
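To make the idea of user-programmable sub-datatype widths concrete, the following is a minimal software sketch of lane-wise addition on a packed word whose lane width is a parameter. It is not the hardware scheme the paper proposes (the abstract does not describe that design); it is the well-known "SIMD within a register" masking trick, emulated in Python. The function names `pack`, `unpack`, `lane_masks` and `packed_add` are hypothetical and chosen only for illustration.

```python
def pack(values, width):
    """Pack integers into one word, `width` bits per lane, lane 0 lowest."""
    mask = (1 << width) - 1
    word = 0
    for i, v in enumerate(values):
        word |= (v & mask) << (i * width)
    return word


def unpack(word, width, lanes):
    """Split a packed word back into a list of `lanes` lane values."""
    mask = (1 << width) - 1
    return [(word >> (i * width)) & mask for i in range(lanes)]


def lane_masks(width, lanes):
    """Return (high, low): the top bit of every lane, and all remaining bits."""
    high = 0
    for i in range(lanes):
        high |= 1 << (i * width + width - 1)
    full = (1 << (width * lanes)) - 1
    return high, full & ~high


def packed_add(a, b, width, lanes):
    """Add two packed words lane by lane, wrapping modulo 2**width per lane."""
    high, low = lane_masks(width, lanes)
    # Add only the lower (width - 1) bits of each lane, so a carry can reach
    # the lane's own top bit but can never spill into the neighbouring lane.
    partial = (a & low) + (b & low)
    # Top bit of each lane = top bit of a XOR top bit of b XOR incoming carry.
    return partial ^ ((a ^ b) & high)


if __name__ == "__main__":
    # Four 5-bit lanes: a width no fixed 8/16/32/64-bit packed format offers.
    x = pack([31, 1, 17, 8], 5)
    y = pack([1, 2, 20, 8], 5)
    print(unpack(packed_add(x, y, 5, 4), 5, 4))
```

Because the lane boundaries live entirely in the two masks, changing the width only changes the masks and not the addition itself. That is the property a programmable-width packed adder needs, here shown in software under the assumptions stated above.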
Similar resources
Integer and Rational Arithmetic on MasPar
The speed of integer and rational arithmetic increases significantly by systolic implementation on a SIMD architecture. For multiplication of integers one obtains linear speed-up (up to 29 times), using a serial-parallel scheme. A two-dimensional algorithm for multiplication of polynomials gives half-linear speed-up (up to 383 times). We also implement multiprecision rational arithmetic using k...
Efficient arithmetic on ARM-NEON and its application for high-speed RSA implementation
Advanced modern processors support Single Instruction Multiple Data (SIMD) instructions (e.g. Intel-AVX, ARM-NEON), and a massive body of research has been conducted on vector-parallel implementations of modular arithmetic, a crucial component of modern public-key cryptography ranging from RSA, ElGamal and DSA to ECC. In this paper, we introduce a novel Double Operand Scanning (DOS) me...
Montgomery Multiplication on the Cell
A technique to speed up Montgomery multiplication targeted at the Synergistic Processor Elements (SPE) of the Cell Broadband Engine is proposed. The technique consists of splitting a number into four consecutive parts. These parts are placed one by one in each of the four element positions of a vector, representing columns in a 4-SIMD organization. This representation enables arithmetic to be p...
Montgomery Modular Multiplication on ARM-NEON Revisited
Montgomery modular multiplication constitutes the “arithmetic foundation” of modern public-key cryptography with applications ranging from RSA, DSA and Diffie-Hellman over elliptic curve schemes to pairing-based cryptosystems. The increased prevalence of SIMD-type instructions in commodity processors (e.g. Intel SSE, ARM NEON) has initiated a massive body of research on vector-parallel implemen...
Kestrel: Design of an 8-bit SIMD Parallel Processor
Kestrel is a high-performance programmable parallel co-processor. Its design is the result of examination and reexamination of algorithmic, architectural, packaging, and silicon design issues, and the interrelations between them. The final system features a linear array of 8-bit processing elements, each with local memory, an arithmetic logic unit (ALU), a multiplier, and other functional units. ...